library(phenopix)
library(jpeg)
library(rgdal) # seems to be necessary for `extractVIs` function, but Kelsey hasn't figured out why yet
library(zoo)
library(tseries)
library(reshape2)
library(strucchange)
library(ggplot2)
library(cowplot)
# To run the code in the console, set the working directory. To knit, do NOT set the working directory as the root directory should be set directly in the knitr options (code chunk #1)
setwd("/Users/elwoodk/Google_Drive/courses/earth-analytics/final-project/phenocam/RGB/")

This tutorial is based primarily on the phenopix vignette, which is available with the command: vignette("base", package = "phenopix")

Structuring a folder tree for the analysis

Giving a good structure to your analysis can make all subsequent steps simple and straightforward. If you are running a site that records images, you will be dealing with quite large folders (likely multiple years of data, and hence several thousand image files) that you need to handle with care. We suggest separate folders for each site (of course) but also for each year of analysis. Each year folder should contain a sub-folder with all images to be processed (/IMG), one containing the reference image, i.e. the image you will use to draw your ROI (/REF), one containing data for the region of interest (/ROI), and one containing the extracted vegetation indices (/VI). The function structureFolder() provides a facility to create these sub-folders. To use it, set the working directory to the year folder within the site folder.
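To give a rough idea of what structureFolder() does, here is a simplified base-R sketch. The helper name and return format below are hypothetical illustrations, not the actual phenopix implementation:

```r
# Simplified sketch of what structureFolder() does: create the four
# sub-folders (IMG, REF, ROI, VI) under a given path and return the paths.
# Illustration only -- the real function handles more cases.
make_pheno_folders <- function(path) {
  subdirs <- file.path(path, c("IMG", "REF", "ROI", "VI"))
  for (d in subdirs) dir.create(d, showWarnings = FALSE, recursive = TRUE)
  # return a named list of paths, similar in spirit to structureFolder()
  setNames(as.list(subdirs), c("img", "ref", "roi", "vi"))
}

paths <- make_pheno_folders(file.path(tempdir(), "harvardlph2016"))
str(paths)
```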

In the example below, we use only one year of data but two types of images: RGB and RGB + IR. To start, let's use just the RGB images, so we set the working directory to the phenocam/RGB folder. This folder follows the structure described above, with four subfolders: /IMG, /REF, /ROI, and /VI. The data are from the "harvardlph" PhenoCam site in Harvard Forest, MA, from April 1, 2016 to December 1, 2016, with images captured every 30 minutes between 10:00 AM and 2:00 PM each day. The data are publicly available from https://phenocam.sr.unh.edu/. I have set the working directory to be the RGB folder.

# Use the code below to create the sub-folders recommended by the `phenopix` package
# You only need to run this code once
my.path <- structureFolder(path = getwd(), showWarnings = FALSE)
str(my.path)

Drawing a region of interest (ROI)

Apart from structuring folders, drawing an ROI is the first, and arguably most important, step of the analysis. The procedure involves two steps: first, a reference image (chosen by the user) is plotted by calling readJPEG() from package jpeg together with rasterImage(); second, the ROI is drawn interactively on that image with DrawROI(). Fig. 1 shows the reference image from the harvardlph site and the code used to plot it. We first define a small helper function to plot images on screen.

.plotImage <- function(image, ...) {
    ncols <- ncol(image)
    nrows <- nrow(image)
    suppressWarnings(plot(0,
        type = "n", xlim = c(0, ncols), ylim = c(0, nrows), ...))
    suppressWarnings(rasterImage(image, 
                                 xleft = 0, 
                                 ybottom = 0, 
                                 xright = ncols, 
                                 ytop = nrows, ...))
}

img <- jpeg::readJPEG("REF/harvardlph_2016_07_15_120007.jpg") # this is the only image in the REF folder; it was chosen and copied into `/REF` manually using the Finder
.plotImage(img)

Figure 1: A jpeg image printed on a graphic device using readJPEG() and rasterImage() embedded in the .plotImage() function

To draw the ROI, use the following code:

DrawROI(path_img_ref = "REF/harvardlph_2016_07_15_120007.jpg", # the folder of your reference image
        path_ROIs = "ROI/", # the path in your computer where to store RData with ROI properties
        nroi = 1, # number of ROIs to draw
        roi.names = c("canopy"), # list of ROI names (in order)
        file.type = ".jpg") 

A call to the function opens a graphic device and uses locator() to define your ROI(s). Note that the behaviour of locator() is somewhat system specific; see the help file ?locator for details. locator() lets you draw a polygon by left-clicking vertices and then right-clicking (or pressing ESC on macOS) to close the polygon. If you have requested more than one ROI, after closing the first polygon the unmodified image appears again so you can draw the second ROI, and so on. Note that the plot title includes the name of the ROI you are currently drawing. When you are done, an .Rdata file called roi.data.Rdata and a .jpg file for each ROI are stored in your ROI folder. The Rdata file is a list with the following structure:

load('ROI/roi.data.Rdata')
str(roi.data)
## List of 1
##  $ canopy:List of 2
##   ..$ pixels.in.roi:'data.frame':    1244160 obs. of  3 variables:
##   .. ..$ rowpos: num [1:1244160] 0.000772 0.001543 0.002315 0.003086 0.003858 ...
##   .. ..$ colpos: num [1:1244160] 0.000772 0.000772 0.000772 0.000772 0.000772 ...
##   .. ..$ pip   : int [1:1244160] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ vertices     :List of 2
##   .. ..$ x: num [1:6] 0.252 0.717 0.992 0.683 0.325 ...
##   .. ..$ y: num [1:6] 0.0452 0.0521 0.5416 0.6101 0.6015 ...

The list contains one element per ROI. Each element is again a list of two elements. The first is a data.frame with the coordinates of all image pixels, together with a flag (pip) indicating whether each pixel falls inside the ROI. The second is a list with the coordinates of the ROI vertices, as returned by locator().
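Conceptually, the pip flag is a point-in-polygon test of each pixel against the ROI vertices. A minimal ray-casting sketch of that test in base R (for intuition only; phenopix computes the flag with its own internals):

```r
# Minimal ray-casting point-in-polygon test (illustration only; phenopix
# computes its pip flag internally). px, py: point; vx, vy: polygon vertices.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # does a horizontal ray from (px, py) cross edge j->i?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &&
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# Unit square: the centre is inside, a far point is not
vx <- c(0, 1, 1, 0); vy <- c(0, 0, 1, 1)
point_in_polygon(0.5, 0.5, vx, vy)  # TRUE
point_in_polygon(1.5, 0.5, vx, vy)  # FALSE
```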

Separate jpeg files for each of your regions of interest are stored in the ROI folder. A call to the function PrintROI() plots all existing ROIs for a picture in the same graph. In the Harvard example, only one ROI was drawn. Here is the code to generate the plot in fig. 2:

# Using arguments in the vignette
PrintROI(path_img_ref = 'REF/harvardlph_2016_07_15_120007.jpg',
         path_ROIs = 'ROI/',
         which = 'all',
         col = palette())

Figure 2: A plot of the regions of interest (ROIs), in output from PrintROI()

NOTE: When you draw an ROI on your best quality image (say 640 x 428 pixels), you will probably need to identify the same ROI in smaller images. This will be the case, for example, if you want to conduct a pixel-based analysis, illustrated later on. Pixel-based analysis is computationally intensive, so it is best run on rather small images. The function updateROI() recalculates which pixels fall within a given ROI for images of a different size than the one on which the ROI was first drawn.

Usage is…

# img2 <- jpeg::readJPEG("REF/harvardlph_2016_07_15_120007.jpg") # read the re-sized image
# updateROI(roi.data, img2)

… where the first argument (old.roi in the help file) is the original roi.data object and the second (new.img) is the re-sized image. A new object with the same structure as the original roi.data is returned.


*** This is where I have finished reviewing/editing to be relevant to the harvard data set

Extraction of vegetation indexes

First, let’s look at the construction of the time series. This requires that R recognize a time vector, typically retrieved from the file name of each picture. The function responsible for this conversion is extractDateFilename(). It is a rather internal function, but it is worthwhile to see how it works in order to set the filenames of your imagery archive properly. Arguments to the function are filename and date.code. filename must be a character string with an underscore "_" separating the site name from the date (e.g. ‘name_date’, as in ‘harvardlph_2016_07_15_120007.jpg’). The format of your date must be provided in date.code; in the example above, date.code is ‘yyyy_mm_dd_HHMM’. Let’s look at some examples, but first remember that the file naming scheme is your responsibility, set up either when you configure the storage process for your images or by renaming routines applied later.

extractDateFilename(filename = "harvardlph_2016_07_15_120007", 
                    date.code = "yyyy_mm_dd_HHMM")
## [1] "2016-07-15 12:00:00 MDT"
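The idea behind this parsing can be reproduced with base R: strip the site name, translate the date.code into a strptime format string, and parse. The helper below is a simplified sketch for intuition, not the actual extractDateFilename() implementation:

```r
# Sketch of filename-to-date parsing, analogous in spirit to
# extractDateFilename(). Assumes 'sitename_<date>' filenames and a
# date.code like 'yyyy_mm_dd_HHMM'. Illustration only.
parse_image_date <- function(filename, date.code = "yyyy_mm_dd_HHMM") {
  date.part <- sub("^[^_]+_", "", filename)  # drop the site name prefix
  fmt <- date.code
  fmt <- sub("yyyy", "%Y", fmt)
  fmt <- sub("mm", "%m", fmt)
  fmt <- sub("dd", "%d", fmt)
  fmt <- sub("HHMM", "%H%M", fmt)            # trailing seconds are ignored
  as.POSIXct(date.part, format = fmt, tz = "UTC")
}

parse_image_date("harvardlph_2016_07_15_120007")
```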

Once we have confirmed that the date code effectively extracts the right date/time information, we can start the extraction of the vegetation indices!

At this point, you have an R object stored as roi.data.Rdata in your ROI path that defines which pixels fall into one or more ROIs. The next step is to extract information on those pixels from each of your images. The function that performs this task is extractVIs(). Its main arguments are outlined below:

extractVIs(img.path = "",    # folder containing the images to process
           roi.path = "",    # folder containing roi.data.Rdata
           vi.path = "",     # folder where the extracted VI data will be stored
           roi.name = c("ROI1", "ROI2"),  # ROI name(s), matching those used in DrawROI()
           plot = TRUE, 
           begin = "YYYY-MM-DD",  # optional date to restrict the processing period
           spatial = FALSE, 
           date.code = "yyyy_mm_dd_HHMM",
           npixels = 1, 
           file.type = ".jpg", 
           bind = TRUE, 
           ncores = 1)
extractVIs(img.path = "IMG/", 
           roi.path = "ROI/", 
           vi.path = "VI/", 
           roi.name = "canopy", 
           plot = TRUE, # return a plot with R, G, and B indexes
           spatial = FALSE, # VI is averaged over the entire ROI (and not for each pixel individually)
           date.code = "yyyy_mm_dd_HHMM", # harvardlph date code structure embedded in file names
           npixels = 1,
           file.type = ".jpg", 
           bind = TRUE, 
           log.file = "VI/", 
           begin = NULL)

Now let’s look more closely at the structure of the object VI.data saved in your /VI directory. (Old note: I had some issues with incomplete extraction of images, particularly when the images were of different sizes (pixel width x height). It is possible that updateROI() would help with this issue, but I simply downsized my large images to match the smaller ones, which worked.)

load("VI/VI.data.Rdata")
summary(VI.data) # a list with data.frames, one for each ROI
##        Length Class      Mode
## canopy 18     data.frame list
names(VI.data[[1]]) # the list of variables (including color indices) extracted from each image
##  [1] "date"   "doy"    "r.av"   "g.av"   "b.av"   "r.sd"   "g.sd"  
##  [8] "b.sd"   "bri.av" "bri.sd" "gi.av"  "gi.sd"  "gei.av" "gei.sd"
## [15] "ri.av"  "ri.sd"  "bi.av"  "bi.sd"
# View(VI.data$canopy) # view the dataframe for the "canopy" ROI

The processing of each ROI produces a data.frame with the date in POSIX format, the numeric day of year (doy), and the vegetation indices: the green, red, and blue digital numbers (range [0, 255]) averaged over the ROI (g.av, r.av, and b.av, respectively) and their standard deviations (g.sd, r.sd, and b.sd). bri.av is the ROI-averaged brightness, calculated as the sum of the red, green, and blue digital numbers of each pixel, averaged over the ROI. From the digital numbers (dn) of each color, relative indexes (rel.i) are calculated as follows:

rel.i = \(\frac{dn_{color}}{(dn_{red} + dn_{green} + dn_{blue})}\)

These values are calculated for each pixel and then averaged over the entire ROI (columns gi.av, ri.av, bi.av), and the standard deviation is calculated as well. In fig.3 you can see how the seasonal course of raw color digital numbers looks:

par(mfrow = c(1,1))
with(VI.data$canopy, plot(date, 
                          r.av, 
                          pch=20, 
                          col='red', 
                          ylim=c(0,255), 
                          xlab = "Date",
                          ylab= "DN [0,255]"))
with(VI.data$canopy, points(date, 
                            g.av, 
                            col='green', 
                            pch=20))
with(VI.data$canopy, points(date, 
                            b.av, 
                            col='blue', 
                            pch=20))
ggplot(data = VI.data$canopy, aes(x = date)) +
    geom_point(aes(y = r.av), color = "firebrick") + 
    geom_point(aes(y = g.av), color = "forestgreen") +
    geom_point(aes(y = b.av), color = "blue3") + 
    labs(title = "Raw Color Digital Numbers (Unfiltered)",
         x = "Date", 
         y = "DN [0,255]") + 
    theme_bw()

Figure 3: Seasonal course of raw digital numbers, Harvard Forest, year 2016
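The relative-index formula above is easy to verify on toy digital numbers (the values below are made up for illustration, not taken from the Harvard data):

```r
# Verify the relative index formula on toy RGB digital numbers.
dn_red <- 123; dn_green <- 97; dn_blue <- 82
total <- dn_red + dn_green + dn_blue

ri <- dn_red   / total  # relative red index
gi <- dn_green / total  # relative green index
bi <- dn_blue  / total  # relative blue index

round(c(ri = ri, gi = gi, bi = bi), 3)
# the three relative indexes always sum to 1 by construction
ri + gi + bi
```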

More interesting is the plot of relative indexes (fig. 4):

with(VI.data$canopy, plot(date, 
                          ri.av, 
                          pch=20, 
                          col='red', 
                          ylim=c(0.1,0.6), 
                          ylab='Relative indexes', 
                          xlab = "Date",
                          main = "Figure 4: Relative Indices"))
with(VI.data$canopy, points(date, 
                            gi.av, 
                            col='green', 
                            pch=20))
with(VI.data$canopy, points(date, 
                            bi.av, 
                            col='blue', 
                            pch=20))
ggplot(data = VI.data$canopy, aes(x = date)) +
    geom_point(aes(y = ri.av), color = "firebrick") + 
    geom_point(aes(y = gi.av), color = "forestgreen") +
    geom_point(aes(y = bi.av), color = "blue3") + 
    labs(title = "Relative Color Indices (Raw, Unfiltered)",
         x = "Date", 
         y = "Color Index Value") + 
    theme_bw()

Figure 4: Seasonal course of the relative green, red, and blue indices, Harvard Forest, year 2016

Filter out data

Data retrieved from images often need robust methods for cleaning the time series. Bad weather, low illumination, and dirty lenses are among the most common sources of noise in time series of vegetation indices. Accordingly, phenopix provides the function autoFilter(), which offers five different filters: “night”, “blue”, “mad”, “max”, and “spline”.

Filters are applied in the order listed in the filter argument. The night filter removes records below a given gcc value (specified in filter.options; the default is 0.2). The blue filter is intended to remove bad images and is very aggressive; it is suggested only for very low quality imagery. It computes the daily mean and standard deviation of bcc, sets an sd threshold as the quantile of the standard deviations with prob = 0.05, builds an envelope as the daily mean bcc +/- this threshold, and discards raw data outside the envelope. The mad filter follows Papale et al. (2006, Biogeosciences), originally developed to remove spikes in FLUXNET data. The max filter, based on Sonnentag et al. (2012), computes the 90th percentile of the curve over a three-day moving window. The spline filter is based on Migliavacca et al. (2011).
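To build some intuition for the mad filter, here is a toy median-absolute-deviation despiking sketch in base R. This is a deliberately simplified stand-in: the actual Papale et al. (2006) procedure works on double-differenced data within moving windows:

```r
# Toy MAD-based despiking: flag values farther than z * MAD from the median.
# Simplified illustration of the idea behind autoFilter's 'mad' filter;
# the real Papale et al. (2006) method is more elaborate.
mad_filter <- function(x, z = 5) {
  med <- median(x, na.rm = TRUE)
  dev <- abs(x - med)
  mad_x <- median(dev, na.rm = TRUE)
  x[dev > z * mad_x] <- NA   # discard spikes
  x
}

gcc <- c(0.35, 0.36, 0.37, 0.90, 0.38, 0.36)  # 0.90 is an obvious spike
mad_filter(gcc)  # the spike becomes NA, all other values are kept
```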

The function is designed to take as input a data.frame structured as the output of extractVIs(), hence its default call may look rather complicated.

Note that autoFilter() fails when there are duplicate rows. To avoid this problem, Kelsey (with the help of Caitlin White) wraps the data.frame in unique() to drop duplicate rows first.

VI.data.veg <- as.data.frame(VI.data$canopy)
VI.data.veg2 <- unique(VI.data$canopy)
filtered.data <- autoFilter(data = unique(VI.data$canopy), 
                            dn=c('ri.av', 'gi.av', 'bi.av'), 
                            brt = 'bri.av', 
                            filter = c("night", "spline", "max"), 
                            na.fill = TRUE)

Figure 5: Raw and filtered relative greenness index, default plot of function autoFilter()

str(filtered.data)
## 'zoo' series from 2016-04-01 to 2016-12-01
##   Data: num [1:243, 1:7] 0.408 0.429 0.395 0.351 0.407 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:7] "rcc" "gcc" "bcc" "brt" ...
##   Index:  POSIXct[1:243], format: "2016-04-01" "2016-04-02" "2016-04-03" "2016-04-04" "2016-04-05" ...
str(VI.data)
## List of 1
##  $ canopy:'data.frame':  2177 obs. of  18 variables:
##   ..$ date  : POSIXct[1:2177], format: "2016-04-01 10:00:00" ...
##   ..$ doy   : num [1:2177] 92 92 92 92 92 92 92 92 92 93 ...
##   ..$ r.av  : num [1:2177] 123 123 122 123 121 ...
##   ..$ g.av  : num [1:2177] 97.3 99.5 98.2 102.9 97 ...
##   ..$ b.av  : num [1:2177] 82 85 84.1 90.8 82 ...
##   ..$ r.sd  : num [1:2177] 21.9 23.9 22.7 27.2 22.7 ...
##   ..$ g.sd  : num [1:2177] 20.8 24.6 23.8 30.3 22.8 ...
##   ..$ b.sd  : num [1:2177] 22.4 26.9 26.5 33.9 25 ...
##   ..$ bri.av: num [1:2177] 302 308 304 317 300 ...
##   ..$ bri.sd: num [1:2177] 62.7 73.3 71.2 90 68.3 ...
##   ..$ gi.av : num [1:2177] 0.322 0.323 0.322 0.324 0.323 ...
##   ..$ gi.sd : num [1:2177] 0.01135 0.01072 0.01024 0.00943 0.01115 ...
##   ..$ gei.av: num [1:2177] -10.17 -9.36 -9.41 -8.5 -8.85 ...
##   ..$ gei.sd: num [1:2177] 8.79 8.25 7.73 7.15 8.37 ...
##   ..$ ri.av : num [1:2177] 0.411 0.406 0.406 0.397 0.408 ...
##   ..$ ri.sd : num [1:2177] 0.0331 0.0351 0.0356 0.0361 0.0353 ...
##   ..$ bi.av : num [1:2177] 0.268 0.271 0.272 0.28 0.269 ...
##   ..$ bi.sd : num [1:2177] 0.0299 0.0314 0.0317 0.0321 0.0317 ...
plot(VI.data.veg)

The structure of the filtered.data output raises three important points:
- We introduce here a new class of R objects (zoo). From here on, all further analyses are based on zoo (or, to a lesser extent, ts) time series. The time index of the data is the numeric day of year (doy). As a consequence, the year attribute is lost at this step of the analysis (we suggest including it in the object name).
- autoFilter() aggregates the data to a daily time step by default. The returned data.frame contains unfiltered (but still daily-aggregated) color indices (here called gcc, rcc, and bcc, with cc standing for chromatic coordinate) and a column of data for each filtering step. The name of the filter applied is reported in the column name.
- The argument na.fill defaults to TRUE, meaning that NAs already present in VI.data (unlikely) or introduced by the filtering procedure (much more likely) are filled by linear approximation (using na.approx from the zoo package). This is done because the subsequent fitting step requires a time series with no NAs. If you want control over the discarded data, e.g. to customize the gap-filling, set na.fill to FALSE. For those unfamiliar with the zoo structure, phenopix provides the function convert() to convert from zoo to a normal data.frame:

dataframed <- convert(filtered.data, year='2016')
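The linear gap-filling that na.fill = TRUE performs (via zoo's na.approx) can be illustrated with base R's approx() on toy numbers (made up for this sketch, not from the Harvard series):

```r
# Illustrate the linear interpolation used by na.fill = TRUE (zoo::na.approx)
# with base R's approx(). Toy daily gcc series with a two-day gap.
doy <- 100:105
gcc <- c(0.35, 0.36, NA, NA, 0.42, 0.43)

filled <- approx(x = doy[!is.na(gcc)], y = gcc[!is.na(gcc)],
                 xout = doy)$y
filled
# gap days 102 and 103 are filled on the straight line between 0.36 and 0.42
```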

However, we strongly recommend getting familiar with the zoo package, since it has excellent facilities for plotting, aggregating, and filling time series. The filters rely on different parameters that can be tuned by the user (called filter options). The function get.options() returns the default filter options, which can then be changed:

my.options <- get.options()
names(my.options) # a named list, one element for each filter
## [1] "night.filter"  "blue.filter"   "mad.filter"    "max.filter"   
## [5] "spline.filter"
## see help file for the meaning
my.options$max.filter$qt <- 0.95 # use 95th percentile instead of 90th for max.filter
filtered.data2 <- autoFilter(unique(VI.data$canopy), filter.options=my.options, plot=FALSE)
plot(filtered.data$max.filtered) ## default options
lines(filtered.data2$max.filtered, col='red') # customized options
legend('topleft', col=palette()[1:2], lty=1, legend=c('90th', '95th'), bty='n')

Figure 6: Effect (not that large indeed) of changing filter options with function autoFilter()

Fit a curve to the data

The seasonal trajectory of the greenness index of a vegetation canopy provides important information per se, but to turn qualitative information into quantitative data we need some more computation. Traditionally, data similar to these (e.g. satellite-based NDVI trajectories) are processed in two main ways: by fitting a parametric curve (e.g. a double logistic equation) to the data, or by smoothing the data (e.g. with splines).

In the package phenopix both possibilities are available. The core function for data fitting and phenophase extraction is greenProcess(). This function calls several rather internal functions that perform the different fittings. The available fittings are: spline, beck, elmore, klosterman, and gu.

All fits except the spline are based on a double logistic function with a different number of parameters.

After curve fitting, relevant dates in the seasonal trajectory (aka phenophases) are extracted with different methods:
- trs splits the seasonal course into an increasing and a decreasing trajectory based on the sign of the first derivative, and then identifies a given threshold (by default 50%) on each. It returns the start of season (sos), the end of season (eos), and the length of season (los) as the difference between the two.
- derivatives identifies phenophases from the derivatives of the fitted curve.
- klosterman returns Greenup, Maturity, Senescence, and Dormancy dates.
- gu returns the upturn date (UD), stabilization date (SD), downturn date (DD), and recession date (RD).
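The trs idea can be sketched in a few lines of base R on a synthetic curve (a toy illustration of the 50% threshold, not the actual PhenoExtract code):

```r
# Toy illustration of the 'trs' idea: split the seasonal curve at its peak
# and find where each half crosses 50% of the seasonal amplitude.
doy <- 1:365
gcc <- 0.33 + 0.10 * exp(-((doy - 200) / 60)^2)  # synthetic green-up curve

amp50 <- min(gcc) + 0.5 * (max(gcc) - min(gcc))  # 50% threshold
peak  <- which.max(gcc)
sos <- min(which(gcc[1:peak] >= amp50))                       # start of season
eos <- peak + max(which(gcc[peak:length(gcc)] >= amp50)) - 1  # end of season
los <- eos - sos                                              # length of season

c(sos = sos, eos = eos, los = los)
```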

Details on curve fitting and phenophase extraction are provided in the help file ?greenProcess, as well as in the help files of more internal functions such as ?KlostermanFit, ?GuFit, and ?PhenoExtract. In fig. 6 we show different fitting methods applied to the same data (Harvard Forest). But first, let’s look at the arguments of greenProcess:

ts is the input zoo time series; it must contain no NAs. The arguments fit and threshold choose the fitting and phenophase methods, respectively. plot is a logical determining whether a plot is returned. The argument which is pertinent only if fit = ‘klosterman’. uncert is a logical for uncertainty computation, with the number of replicates controlled by nrep; envelope and quantiles will be detailed later. hydro is a logical indicating whether days should be converted to hydro days before the analysis, in which October 1st becomes doy 1 and so on (designed for the southern hemisphere or for winter-growing plants). Since phenopix version > 2.0, the uncertainty estimation benefits from parallelization: the argument ncores controls the number of cores used in parallel computation; the default is ‘all’, and the actual number of cores you want to use can be set with an integer. Parallelization is performed by calling the function foreach from the foreach package.
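The hydro-day conversion can be sketched as follows. The helper below is a hypothetical illustration (assuming a non-leap year, where October 1st is doy 274), not the phenopix internal:

```r
# Sketch of the hydrological day-of-year conversion used when hydro = TRUE:
# October 1st becomes day 1. Assumes a non-leap year (Oct 1 = doy 274).
to_hydro_doy <- function(doy, oct1 = 274) {
  ifelse(doy >= oct1, doy - oct1 + 1, doy + (365 - oct1 + 1))
}

to_hydro_doy(274)  # October 1st   -> hydro day 1
to_hydro_doy(365)  # December 31st -> hydro day 92
to_hydro_doy(1)    # January 1st   -> hydro day 93
```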

## spline curve + trs phenophases
fit1 <- greenProcess(ts = filtered.data$max.filtered, 
                     fit = 'spline', 
                     threshold = 'trs', 
                     plot=FALSE)
summary(fit1)
## 
## Data
##      Index          observed     
##  Min.   : 92.0   Min.   :0.2946  
##  1st Qu.:152.5   1st Qu.:0.3229  
##  Median :213.0   Median :0.3879  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.3977  
##  Max.   :336.0   Max.   :0.4258  
## 
## Predicted
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.3095  
##  1st Qu.:152.5   1st Qu.:0.3233  
##  Median :213.0   Median :0.3898  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.3976  
##  Max.   :336.0   Max.   :0.4174  
## 
## Formula
## NULL
## 
## Thresholds
##         sos         eos         los         pop         mgs         rsp 
## 138.0000000 286.0000000 148.0000000 160.0000000   0.3964404          NA 
##         rau        peak         msp         mau 
##          NA   0.4173794   0.3664767   0.3634738
## Beck fitting + derivatives
fit2 <- greenProcess(ts = filtered.data$max.filtered, 
                     fit = 'beck', 
                     threshold = 'derivatives', 
                     plot=FALSE)

## klosterman fitting + klosterman phenophases
fit3a <- greenProcess(ts = filtered.data$max.filtered, 
                     fit = 'klosterman', 
                     threshold = 'klosterman',
                     uncert = FALSE, # set to TRUE to estimate uncertainty (increases computation time)
                     nrep = 100, # for this data set, nrep = 100 takes about 8 minutes
                     which = "heavy", # recursive optimization, about 10x slower than the "light" version but better at representing the data; try the light version first (see ?greenProcess)
                     plot=FALSE)
summary(fit3a)
## 
## Data
##      Index          observed     
##  Min.   : 92.0   Min.   :0.2946  
##  1st Qu.:152.5   1st Qu.:0.3229  
##  Median :213.0   Median :0.3879  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.3977  
##  Max.   :336.0   Max.   :0.4258  
## 
## Predicted
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.2971  
##  1st Qu.:152.5   1st Qu.:0.3292  
##  Median :213.0   Median :0.3806  
##  Mean   :213.5   Mean   :0.3671  
##  3rd Qu.:274.5   3rd Qu.:0.4031  
##  Max.   :336.0   Max.   :0.4101  
## 
## Formula
## expression((a1 * t + b1) + (a2 * t^2 + b2 * t + c) * (1/(1 + 
##     q1 * exp(-B1 * (t - m1)))^v1 - 1/(1 + q2 * exp(-B2 * (t - 
##     m2)))^v2))
## 
## Thresholds
##    Greenup   Maturity Senescence   Dormancy 
##        111        158        253        336
## gu fitting and phenophases
fit4 <- greenProcess(ts = filtered.data$max.filtered, 
                     fit = 'gu', 
                     threshold = 'gu', 
                     plot=FALSE)
summary(fit4)
## 
## Data
##      Index          observed     
##  Min.   : 92.0   Min.   :0.2946  
##  1st Qu.:152.5   1st Qu.:0.3229  
##  Median :213.0   Median :0.3879  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.3977  
##  Max.   :336.0   Max.   :0.4258  
## 
## Predicted
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.3141  
##  1st Qu.:152.5   1st Qu.:0.3231  
##  Median :213.0   Median :0.3888  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.4033  
##  Max.   :336.0   Max.   :0.4046  
## 
## Formula
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t - 
##     t02)/b2))^c2))
## 
## Thresholds
##            UD            SD            DD            RD       maxline 
##  131.99385793  143.09621465  280.49413430  304.18645354    0.87663946 
##      baseline           prr           psr plateau.slope 
##   -0.02100819    0.08085199   -0.03237116   -0.00120529
## elmore fitting + klosterman phenophases
fit5 <- greenProcess(ts = filtered.data$max.filtered, 
                     fit = 'elmore', 
                     threshold = 'klosterman', 
                     plot=FALSE)
summary(fit5)
## 
## Data
##      Index          observed     
##  Min.   : 92.0   Min.   :0.2946  
##  1st Qu.:152.5   1st Qu.:0.3229  
##  Median :213.0   Median :0.3879  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.3977  
##  Max.   :336.0   Max.   :0.4258  
## 
## Predicted
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.3218  
##  1st Qu.:152.5   1st Qu.:0.3218  
##  Median :213.0   Median :0.3856  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.4009  
##  Max.   :336.0   Max.   :0.4151  
## 
## Formula
## expression(m1 + (m2 - m7 * t) * ((1/(1 + exp(((m3/m4) - t)/(1/m4)))) - 
##     (1/(1 + exp(((m5/m6) - t)/(1/m6))))))
## 
## Thresholds
##    Greenup   Maturity Senescence   Dormancy 
##        132        149        284        301
par(mfrow = c(3,2))

plot(fit1, 
     type='p', 
     pch=20, 
     col='grey',
     xlab = "DOY")

summary(fit2)
## 
## Data
##      Index          observed     
##  Min.   : 92.0   Min.   :0.2946  
##  1st Qu.:152.5   1st Qu.:0.3229  
##  Median :213.0   Median :0.3879  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.3977  
##  Max.   :336.0   Max.   :0.4258  
## 
## Predicted
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.3221  
##  1st Qu.:152.5   1st Qu.:0.3244  
##  Median :213.0   Median :0.4056  
##  Mean   :213.5   Mean   :0.3760  
##  3rd Qu.:274.5   3rd Qu.:0.4097  
##  Max.   :336.0   Max.   :0.4097  
## 
## Formula
## expression(mn + (mx - mn) * (1/(1 + exp(-rsp * (t - sos))) + 
##     1/(1 + exp(rau * (t - eos)))))
## 
## Thresholds
##           sos           eos           los           pop           mgs 
## 138.000000000 289.000000000 151.000000000 185.000000000   0.405554965 
##           rsp           rau          peak           msp           mau 
##   0.007890304  -0.003397248   0.409702205   0.366088410   0.364331237
plot(fit2, 
     type='p', 
     pch=20, 
     col='grey')

plot(fit3a, 
     type = 'p', 
     pch = 20, 
     col = 'grey')

plot(fit4, 
     type='p', 
     pch=20, 
     col='grey')

plot(fit5, 
     type='p', 
     pch=20, 
     col='grey')

Figure 6: Comparison of curve fittings and phenophase calculation methods (aka “thresholds”)

## show all together
par(mfrow = c(1,1))
t <- as.numeric(format(index(filtered.data$max.filtered), '%j'))
par(lwd=3)
plot(t, dataframed$max.filtered, type='p', pch=20, ylab='Green chromatic coordinate', xlab='DOYs')
lines(fitted(fit1), col='blue')
lines(fitted(fit2), col='red')
lines(fitted(fit3a), col='green')
lines(fitted(fit4), col='violet')
legend('topleft', col=c('blue', 'red', 'green', 'violet'),
lty=1, legend=c('Spline', 'Beck', 'Klosterman', 'Gu'), bty='n')

Figure 7: Comparison of 4 different fittings from phenopix package

The function greenProcess() creates an object of class phenopix with dedicated methods. The summary method displays a summary of the input data and of the predicted points. It then reports the formula of the fitting equation where pertinent; see e.g. the summary of fit1, which is not based on an equation.

Phenophases are printed as well. Note also the fitted() method for phenopix objects, which returns a zoo time series of fitted values that can be added directly as lines to a plot.

To complete the overview of the generic display methods for objects of class phenopix, here is the application of the generic plot (fig. 8) and print functions:

The plot function (below) returns a plot similar to the one constructed above, except that the extracted phenophases are also shown as vertical colored lines.

plot(fit4, 
     pch=20, 
     col='grey', 
     type='p', 
     xlab='DOYs', 
     ylab='Green chromatic coordinates')

Figure 8: Generic plot function applied to phenopix objects

The print function (below) returns information similar to summary, but it also reports which fitting and phenophase methods were used and whether the uncertainty was estimated.

Fig. 7 shows that the different fitting equations lead to very similar fitted values on the Harvard Forest data. For the sake of robustness, in such situations it is preferable to choose a fitted equation rather than a spline fit. Let’s choose the Gu fitting. Now let’s look more closely at how the different phenophase extraction methods behave when applied to the same fitted curve in fig. 9 (and note the use of the generic update function with phenopix objects):

print(fit4)
## 
##  #### phenopix time series processing ####
## 
## FITTING: GU
## 
## PREDICTED VALUES:
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.3141  
##  1st Qu.:152.5   1st Qu.:0.3231  
##  Median :213.0   Median :0.3888  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.4033  
##  Max.   :336.0   Max.   :0.4046  
## 
## FITTING EQUATION:
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t - 
##     t02)/b2))^c2))
## 
## FITTING PARAMETERS:
##           y0           a1           a2          t01          t02 
##   0.06777627   0.81096539   0.89974985 126.24936560 303.55806891 
##           b1           b2           c1           c2 
##   3.59347356   1.27076400  19.21057048   0.05710031 
## 
## THRESHOLDS: GU
##            UD            SD            DD            RD       maxline 
##  131.99385793  143.09621465  280.49413430  304.18645354    0.87663946 
##      baseline           prr           psr plateau.slope 
##   -0.02100819    0.08085199   -0.03237116   -0.00120529 
## 
## UNCERTAINTY: FALSE
##  N of replications = 0 
## 
## HYDROLOGICAL DAY OF YEAR: FALSE
fit4.trs <- update(fit4, 'trs', plot=FALSE)
fit4.klosterman <- update(fit4, 'klosterman', plot=FALSE)
fit4.gu <- update(fit4, 'gu', plot=FALSE)
par(mfrow=c(2,2), oma=c(5,4,4,2), mar=rep(0,4))
plot(fit4.trs, type='n', main='', xaxt='n')
mtext('trs', 3, adj=0.1, line=-2)
plot(fit4.klosterman, type='n', main='', xaxt='n', yaxt='n')
mtext('klosterman', 3, adj=0.1, line=-2)
plot(0, type='n', axes=FALSE, xlab='', ylab='')
plot(fit4.gu, type='n', main='', yaxt='n')
axis(4)
mtext('gu', 3, adj=0.1, line=-2)

Figure 9: Three phenophase methods applied to the Gu fitting

The trs thresholds (50% of the increasing and decreasing trajectories) hold a different meaning compared to the Klosterman and Gu phenophases. The latter two show good correspondence, except that Klosterman’s beginning of senescence occurs later than the corresponding phase in the Gu thresholds (i.e., DD, the downturn date).

We have shown 5 different approaches to mathematically describe the seasonal trajectory of greenness, with an additional 4 methods to extract phenophases from the fitted curves. The combination of curves and phenophase methods leads to as many as 20 possible approaches to describe a seasonal trajectory. Sometimes it is useful to decide which curve and phenophase method to use without computing the uncertainty on all of them. To this end, two functions provide a quick overview of what would be the best fit and phenophase method for your trajectory. Here is how to compute the 20 combinations of fit and phenophase in a single function:

explored <- greenExplore(filtered.data$max.filtered)
## [1] "Fitting spline 1/5"
## [1] "Fitting Beck 2/5"
## [1] "Fitting Elmore 3/5"
## [1] "Fitting Klosterman 4/5"
## [1] "Fitting Gu 5/5"

explored is a list with 20 + 1 elements, i.e. the 20 combinations plus a vector containing the RMSE from each of the 5 fittings. This object is intended to be used only as the argument of the plotExplore() function (fig. 10):

plotExplore(explored)

Figure 10: Overview of all combinations of curves and fits as obtained by the plotExplore function

The plot in fig. 10 shows the impact of the different fittings (moving downwards) and the different phenophase methods (from left to right) on the same data (Harvard Forest). The RMSE (root-mean-square error) of each fitting method is also annotated in the first column. Lower values indicate smaller differences between modeled and observed values (i.e. residuals), and therefore better model fits. This plot can help you choose the most appropriate fitting and thresholding methods for your data.
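As a reminder of what the annotated RMSE measures, here is a minimal base-R sketch with hypothetical observed and fitted values (not phenopix code):

```r
# Hypothetical gcc values (observed) and values predicted by some fit
observed <- c(0.32, 0.35, 0.40, 0.41, 0.38)
fitted.v <- c(0.31, 0.36, 0.39, 0.40, 0.39)
# RMSE: square root of the mean squared residual
rmse <- sqrt(mean((observed - fitted.v)^2))
round(rmse, 2)  # → 0.01
```

The fit with the smallest such value reproduces the data most closely.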

greenProcess is a wrapper function that allows the user to define the fitting and phenophase methods as arguments. The “primitive” functions that actually perform the fits are the following: BeckFit, ElmoreFit, KlostermanFit and so on. Their usage is generally:

args(ElmoreFit)
## function (ts, uncert = FALSE, nrep = 100, ncores = "all", sf = quantile(ts, 
##     probs = c(0.05, 0.95), na.rm = TRUE)) 
## NULL

with the most important argument being ts, the time series. Compared to greenProcess, the single fitting functions have the advantage of allowing more flexibility, but in general the user won’t need to use them.

The phenophase extraction methods also have a dedicated wrapper function, PhenoExtract(), already embedded in the greenProcess() function, whose usage is:

args(PhenoExtract)
## function (data, method = "trs", uncert = FALSE, breaks = 3, envelope = "quantiles", 
##     quantiles = c(0.1, 0.9), plot = TRUE, sf, ...) 
## NULL

where the method argument selects the phenophase method. Note that the input data in this case should be a fitted time series as returned by e.g. FitDoubleLogElmore, not a phenopix object as returned by greenProcess. Here is an example:

# Both method 1 and 2 should produce the same values

# Method 1
fit.elmore <- greenProcess(filtered.data$max.filtered, 'elmore', 'trs', plot=FALSE)
extract(fit.elmore, 'metrics')
##         sos         eos         los         pop         mgs         rsp 
##          NA 290.0000000          NA 145.0000000   0.3973447          NA 
##         rau        peak         msp         mau 
##          NA   0.4150725          NA   0.3575157
# Method 2
fit.elmore.2 <- ElmoreFit(filtered.data$max.filtered)
PhenoExtract(fit.elmore.2, 'trs', plot=FALSE)
## $metrics
##         sos         eos         los         pop         mgs         rsp 
##          NA 290.0000000          NA 145.0000000   0.3973447          NA 
##         rau        peak         msp         mau 
##          NA   0.4150725          NA   0.3575157 
## 
## $unc.df
## NULL

The uncertainty estimation

One main functionality of the package is the uncertainty estimation, which is performed in different ways depending on the fitting equation. The basic idea is to assess how well the smoothing curve fits the data: the residuals between fitted and observed values are used to generate random noise, and the fitting is applied recursively to the randomly-noised original data. This procedure results in an ensemble of curves, curve parameters, and extracted phenophases that represents the uncertainty estimate. The uncertainty in the curve parameters is automatically propagated to the phenophase extraction. In the following example the uncertainty estimation is performed on the Harvard Forest data fitted with the approach of Gu et al. (2009), with 100 replications. Here is the code:

# Takes about 1 minute to compute:
fit.complete <- greenProcess(ts = filtered.data$max.filtered, 
                             fit = 'gu', 
                             threshold= 'gu', 
                             plot = FALSE, 
                             uncert = TRUE, 
                             nrep = 100)
## [1] "estimated computation time (8 cores): 1 mins"
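The residual-based noising performed internally by greenProcess can be illustrated conceptually in base R. This is only a sketch of the idea, using loess as a stand-in for the double logistic fit; the series and all names are hypothetical:

```r
set.seed(42)
doy <- 92:336
# synthetic greenness trajectory with observational noise
obs <- 0.3 + 0.1 / (1 + exp(-(doy - 130) / 5)) + rnorm(length(doy), sd = 0.005)
fit <- loess(obs ~ doy, span = 0.3)   # stand-in for the real fitting equation
res <- residuals(fit)                 # observed minus fitted
# one replicate: perturb the data with resampled residuals and refit
noised <- fitted(fit) + sample(res, replace = TRUE)
refit  <- loess(noised ~ doy, span = 0.3)
# repeating this nrep times builds the ensemble of curves and phenophases
```

The spread of the refitted curves (and of the phenophases extracted from each of them) is what the package summarises as the uncertainty envelope.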

And here is fit.complete printed:

print(fit.complete)
## 
##  #### phenopix time series processing ####
## 
## FITTING: GU
## 
## PREDICTED VALUES:
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.3141  
##  1st Qu.:152.5   1st Qu.:0.3231  
##  Median :213.0   Median :0.3888  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.4033  
##  Max.   :336.0   Max.   :0.4046  
## 
## FITTING EQUATION:
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t - 
##     t02)/b2))^c2))
## 
## FITTING PARAMETERS:
##           y0           a1           a2          t01          t02 
##   0.06777627   0.81096539   0.89974985 126.24936560 303.55806891 
##           b1           b2           c1           c2 
##   3.59347356   1.27076400  19.21057048   0.05710031 
## 
## THRESHOLDS: GU  ENVELOPE:QUANTILES
##                UD            SD       DD       RD   maxline      baseline
## 10% -9.907452e+17 -1.827194e+19 198.0000 210.6268 0.8108180 -0.0188717161
## 50%  1.335436e+02  1.449789e+02 198.0000 211.5403 0.8155323 -0.0120058988
## 90%  6.032941e+17  2.946881e+19 285.6357 299.4923 0.8387973 -0.0002047714
##           prr         psr plateau.slope
## 10% 0.0000000 -0.06620943 -0.0008683386
## 50% 0.0000000 -0.06190401 -0.0006223380
## 90% 0.0851251 -0.05191868  0.0108112489
## 
## UNCERTAINTY: TRUE
##  N of replications = 100 
## 
## HYDROLOGICAL DAY OF YEAR: FALSE

As you can see from the output, the default behavior of greenProcess() for the uncertainty computation is to report the median, 10th, and 90th percentiles of the uncertainty ensemble. This can be changed via the envelope argument: the other option is 'min-max', which returns the minimum, mean, and maximum. In addition, the quantiles used with envelope = 'quantiles' can be changed via the quantiles argument. Here is an example:

print(update(fit.complete, 'gu', envelope='min-max', plot = FALSE))
## 
##  #### phenopix time series processing ####
## 
## FITTING: GU
## 
## PREDICTED VALUES:
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.3141  
##  1st Qu.:152.5   1st Qu.:0.3231  
##  Median :213.0   Median :0.3888  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.4033  
##  Max.   :336.0   Max.   :0.4046  
## 
## FITTING EQUATION:
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t - 
##     t02)/b2))^c2))
## 
## FITTING PARAMETERS:
##           y0           a1           a2          t01          t02 
##   0.06777627   0.81096539   0.89974985 126.24936560 303.55806891 
##           b1           b2           c1           c2 
##   3.59347356   1.27076400  19.21057048   0.05710031 
## 
## THRESHOLDS: GU  ENVELOPE:MIN-MAX
##                 UD            SD       DD       RD   maxline     baseline
## min  -7.405566e+18 -2.009143e+20 197.0000 209.7083 0.8065911 -0.024507113
## mean -2.406225e+17  6.183363e+18 232.8477 246.3947 0.8223564 -0.010615201
## max   5.363112e+18  2.408786e+20 287.0987 304.2033 0.8579503  0.005880146
##             prr         psr plateau.slope
## min  0.00000000 -0.07056026  -0.001126931
## mean 0.03212879 -0.06006440   0.003896021
## max  0.09137311 -0.03240724   0.010921226
## 
## UNCERTAINTY: TRUE
##  N of replications = 100 
## 
## HYDROLOGICAL DAY OF YEAR:

Besides the few default options described above, the uncertainty data.frame is accessible via the extract command, by running:

extract(fit.complete, 'metrics.uncert') ## get threshold uncertainty data
extract(fit.complete, 'params.uncert') ## get parameters of each fitting curve
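A custom envelope can then be built from the returned data.frame with base R quantiles. The sketch below uses a made-up replicate table; in practice it would come from extract(fit.complete, 'metrics.uncert'):

```r
set.seed(1)
# hypothetical ensemble: 100 replicate estimates of two phenophases
unc <- data.frame(sos = rnorm(100, mean = 138, sd = 1),
                  eos = rnorm(100, mean = 290, sd = 2))
# 10th, 50th and 90th percentile of each phenophase across replicates
env <- apply(unc, 2, quantile, probs = c(0.1, 0.5, 0.9))
env
```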

For example, if you want to use the phenophases extracted from the true model and construct an uncertainty envelope around them, you can access the uncertainty data.frame with the commands given above. Note that when the uncertainty is computed, the plot function also changes its behavior: it additionally shows the uncertainty curve ensemble and an error bar on the extracted phases (fig. 11).

plot(fit.complete, type='p', pch=20)

Figure 11: The uncertainty estimation (100 replications) on the Gu fit and Gu phenophases

The distribution of the uncertainty parameters (phenophases + curve parameters) can also be evaluated by means of box plots, using an extra option of the default plot method:

plot(fit.complete, what='thresholds')

By using the update function you can also extract phenophases according to a different method, without refitting the data. Here is the code:

update(fit.complete, 'klosterman', plot=FALSE)
## 
##  #### phenopix time series processing ####
## 
## FITTING: GU
## 
## PREDICTED VALUES:
##      Index         predicted     
##  Min.   : 92.0   Min.   :0.3141  
##  1st Qu.:152.5   1st Qu.:0.3231  
##  Median :213.0   Median :0.3888  
##  Mean   :213.5   Mean   :0.3687  
##  3rd Qu.:274.5   3rd Qu.:0.4033  
##  Max.   :336.0   Max.   :0.4046  
## 
## FITTING EQUATION:
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t - 
##     t02)/b2))^c2))
## 
## FITTING PARAMETERS:
##           y0           a1           a2          t01          t02 
##   0.06777627   0.81096539   0.89974985 126.24936560 303.55806891 
##           b1           b2           c1           c2 
##   3.59347356   1.27076400  19.21057048   0.05710031 
## 
## THRESHOLDS: KLOSTERMAN  ENVELOPE:QUANTILES
##     Greenup Maturity Senescence Dormancy
## 10%     126      149        281      303
## 50%     128      150        281      305
## 90%     129      150        282      306
## 
## UNCERTAINTY: TRUE
##  N of replications = 100 
## 
## HYDROLOGICAL DAY OF YEAR:

The trs phenophase extraction method allows an extra argument that controls which threshold in the trajectory is used. The default is the point at which 50% of the seasonal maximum gcc is reached (indicated as 0.5). Let’s see how it works:

extract(update(fit.complete, 
               'trs', 
               trs = 0.5, 
               plot=FALSE), 
        'metrics') # default to 50% of increasing
##     sos eos los   pop       mgs rsp rau      peak       msp       mau
## 10% 137 290 151 148.0 0.3956421  NA  NA 0.3979850 0.3576441 0.3542848
## 50% 138 290 152 210.0 0.3961202  NA  NA 0.3984602 0.3603773 0.3590862
## 90% 138 290 153 211.5 0.3964265  NA  NA 0.4008053 0.3616334 0.3600816
extract(update(fit.complete, 
               'trs', 
               trs=0.2, 
               plot=FALSE), 
        'metrics') # changed to 20%
##       sos eos   los   pop       mgs rsp rau      peak       msp       mau
## 10% 127.9 296 163.9 148.0 0.3912752  NA  NA 0.3979850 0.3312207 0.3373970
## 50% 128.0 296 168.0 210.0 0.3916978  NA  NA 0.3984602 0.3323248 0.3385821
## 90% 132.1 296 169.0 211.5 0.3927517  NA  NA 0.4008053 0.3418718 0.3394631

There is one last method to define thresholds on a time series that does not require a fitting. It uses the breakpoints from the strucchange package and works as follows:

# uses `strucchange` package
par(mfrow = c(1,1))
print(phenopix::PhenoBP(x = filtered.data$max.filtered, breaks = 4, plot = TRUE, confidence= 0.99))

##              bp1        bp2        bp3
## 0.5%  1463724000 1472191200 1476597600
## mean  1463810400 1473746400 1476856800
## 99.5% 1463896800 1473832800 1476943200

The user can set the maximum number of breakpoints to be identified, the confidence interval at which the calculation is performed, and an option to produce a plot. The output data.frame contains the position of each breakpoint (here expressed as a time stamp) together with its respective confidence interval.
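PhenoBP relies on the strucchange package; for reference, here is a minimal direct use of strucchange on a synthetic series (a sketch, assuming strucchange is installed; the series and break positions are made up):

```r
library(strucchange)
set.seed(7)
# synthetic greenness-like series with two shifts in the mean
y  <- ts(c(rnorm(50, 0.32, 0.01), rnorm(50, 0.40, 0.01), rnorm(50, 0.33, 0.01)))
bp <- breakpoints(y ~ 1, breaks = 2)  # at most 2 breakpoints in the mean
breakpoints(bp)$breakpoints           # estimated break positions
confint(bp)                           # confidence intervals around them
```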

Pushing forward the analysis: pixel-based phenology

In order to thoroughly exploit the capabilities of an imagery archive, spatial analysis represents the most promising feature. Hence, specific functions fit curves and extract phenophases for each pixel in a region of interest, instead of averaging the greenness index over the entire ROI. A dedicated vignette of this package explains the details of the pixel-based analysis.

Other functions

A number of other functions are available in the package that do not necessarily enter the main processing workflow but are still worth mentioning.

plotVI() takes as input a VI.data data.frame as produced by extractVIs and reproduces the default plots from extractVIs. It is useful when you run extractVIs with the begin argument switched on and want to update existing plots. hydrodoy() converts to and from the hydrological day of year, to be used in conjunction with greenProcess and hydro=TRUE.

—– From Filippa et al. (2016) - Agr. and Forest Meteorology —–

Main functions

The typical work-flow of the phenopix package is summarised in the flowchart shown in Fig. 2. First, one (or more) region(s) of interest (ROI) is (are) chosen, then digital colour numbers are extracted from the ROI of each image, and processed to obtain a seasonal time series. After filtering the time series, data are fitted with either a double logistic equation or a smoothing curve, on which phenological thresholds (phenophases) are extracted. Finally, uncertainty of the fit and of phenophases can be computed.

Regions of interest (ROIs)

The scene of a picture rarely includes only the targeted vegetation canopy, so the user will typically want to choose a particular region within the scene for analysis. Often more than one region is of interest; for example, in a mixed forest one might independently analyse different deciduous species and evergreen trees (e.g. Ahrends et al., 2008). The function DrawROI() allows the user to draw one or more regions of interest on-screen, using the mouse cursor on a chosen reference picture.

The arguments for the DrawROI() function are explained in the package help, but a little clarification may be useful. The signature is DrawROI(path_img_ref, path_ROIs, nroi = 1, roi.names, file.type = '.jpg'), where:

* path_img_ref is the name of the image that you want to explore. The full path is not needed if the image is in the working directory.
* path_ROIs is the folder where you want to save the ROI.
* nroi is the number of ROIs that you want to define in the image.
* roi.names is the name that you want to give to your new ROI.

DrawROI("nwt-bowman.jpg", path_ROIs = "/Users/elwoodk/Google Drive/ElwoodK_Research/Computation/Phenopix/ROIs", roi.names = "roi.test2", nroi = 1, file.type = ".jpg") # Click on vertices. Use 'Esc' to finish. 

DrawROI(path_img_ref = "nwt-sena.jpg", path_ROIs = "/Users/elwoodk/Google Drive/ElwoodK_Research/Computation/Phenopix/ROIs/roi.test")

Extract vegetation indices

From the digital colour values of each image the green chromatic coordinate \(G_{CC}\) is computed. \(G_{CC}\) is a vegetation index derived from photographic images that quantifies greenness relative to total brightness. \(G_{CC}\) is computed as follows:

\(G_{CC} = \frac{G_{DN}}{R_{DN} + G_{DN} + B_{DN}}\)

where \(G_{DN}\), \(R_{DN}\), and \(B_{DN}\) are the green, red, and blue digital numbers, respectively (Gillespie et al., 1987). Similarly, the chromatic coordinates of red and blue (\(R_{CC}\) and \(B_{CC}\)) are also computed. Several indices based on RGB colours have been developed in recent years, including for example the green excess index (GEI) (Woebbecke et al., 1995; Mizunuma et al., 2013). Some authors also used a combination of \(G_{CC}\) and \(R_{CC}\) to extract autumn phenophases (e.g. Klosterman et al., 2014). For simplicity, all subsequent analyses focus on \(G_{CC}\), but phenopix allows the analysis of the whole variety of colour indices, including the computation of new ones.
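The equation above is straightforward to compute from the raw digital numbers. A minimal base-R sketch with hypothetical ROI-averaged values:

```r
# hypothetical mean digital numbers over a ROI
R_dn <- 90; G_dn <- 120; B_dn <- 70
brightness <- R_dn + G_dn + B_dn
gcc <- G_dn / brightness
rcc <- R_dn / brightness
bcc <- B_dn / brightness
round(gcc, 4)  # → 0.4286
```

By construction the three chromatic coordinates sum to 1, which is why greenness is read as a fraction of total brightness.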

The function extractVIs() extracts the raw red, green, and blue digital numbers from each pixel in the ROI and computes the colour chromatic coordinates as in Eq. (1). Vegetation indices can be computed on ROIs with two approaches (Fig. 2): (1) the ROI-averaged approach, where colour chromatic coordinates are computed for each pixel and then averaged over the whole ROI, and (2) the pixel-based approach, where each pixel belonging to the ROI is analysed separately (Section 5). The procedure is repeated for each image in the archive. A time-stamp is retrieved from the file name of each image and a time series of the computed indices is returned. Specific rules must be followed in naming the image files; the reader is referred to the package help pages for more details.
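The time-stamp retrieval amounts to parsing a date out of the file name. A sketch of the idea in base R, using a hypothetical file name (check the package help for the exact naming convention extractVIs expects):

```r
fname <- "harvardlph_2016_04_01_103005.jpg"
# strip the site prefix and the extension, keep the date-time part
stamp <- sub("^[^_]+_", "", sub("\\.jpg$", "", fname))   # "2016_04_01_103005"
when  <- strptime(stamp, format = "%Y_%m_%d_%H%M%S", tz = "UTC")
format(when, "%Y-%m-%d %H:%M:%S")  # → "2016-04-01 10:30:05"
```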

vignette('phenopix')